source identification
Physics-informed sensor coverage through structure preserving machine learning
Shaffer, Benjamin David, Kinch, Brooks, Klobusicky, Joseph, Hsieh, M. Ani, Trask, Nathaniel
We present a machine learning framework for adaptive source localization in which agents use a structure-preserving digital twin of a coupled hydrodynamic-transport system for real-time trajectory planning and data assimilation. The twin is constructed with conditional neural Whitney forms (CNWF), coupling the numerical guarantees of finite element exterior calculus (FEEC) with transformer-based operator learning. The resulting model preserves discrete conservation, and adapts in real time to streaming sensor data. It employs a conditional attention mechanism to identify: a reduced Whitney-form basis; reduced integral balance equations; and a source field, each compatible with given sensor measurements. The induced reduced-order environmental model retains the stability and consistency of standard finite-element simulation, yielding a physically realizable, regular mapping from sensor data to the source field. We propose a staggered scheme that alternates between evaluating the digital twin and applying Lloyd's algorithm to guide sensor placement, with analysis providing conditions for monotone improvement of a coverage functional. Using the predicted source field as an importance function within an optimal-recovery scheme, we demonstrate recovery of point sources under continuity assumptions, highlighting the role of regularity as a sufficient condition for localization. Experimental comparisons with physics-agnostic transformer architectures show improved accuracy in complex geometries when physical constraints are enforced, indicating that structure preservation provides an effective inductive bias for source identification.
- North America > United States (0.46)
- North America > Mexico (0.04)
- Atlantic Ocean > Gulf of Mexico (0.04)
- Energy (0.68)
- Government > Regional Government (0.46)
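The staggered scheme in the abstract above relies on Lloyd's algorithm with an importance function. The following is a minimal sketch of a weighted Lloyd iteration on a discretized unit square, not the authors' implementation. It assumes a fixed Gaussian importance function around a hypothetical source at (0.7, 0.3), whereas the paper updates the importance function from the digital twin between iterations. The names `lloyd_step`, `coverage_cost`, the grid size, and the initial sensor positions are all illustrative.

```python
import math

def nearest(sensors, x, y):
    """Index of the sensor closest to (x, y), i.e. the owner of its Voronoi cell."""
    return min(range(len(sensors)),
               key=lambda i: (sensors[i][0] - x) ** 2 + (sensors[i][1] - y) ** 2)

def coverage_cost(sensors, points, weights):
    """Importance-weighted squared distance from each grid point to its nearest sensor."""
    return sum(w * min((sx - x) ** 2 + (sy - y) ** 2 for sx, sy in sensors)
               for (x, y), w in zip(points, weights))

def lloyd_step(sensors, points, weights):
    """Move every sensor to the weighted centroid of its Voronoi cell."""
    acc = [[0.0, 0.0, 0.0] for _ in sensors]  # per cell: sum(w*x), sum(w*y), sum(w)
    for (x, y), w in zip(points, weights):
        k = nearest(sensors, x, y)
        acc[k][0] += w * x
        acc[k][1] += w * y
        acc[k][2] += w
    # A sensor whose cell carries no weight stays where it is.
    return [(ax / aw, ay / aw) if aw > 0 else old
            for (ax, ay, aw), old in zip(acc, sensors)]

# Discretize the unit square (assumed domain) into cell centers.
n = 20
points = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]

# Hypothetical importance function: a Gaussian bump around an assumed source.
source = (0.7, 0.3)
weights = [math.exp(-((x - source[0]) ** 2 + (y - source[1]) ** 2) / 0.02)
           for x, y in points]

sensors = [(0.1, 0.1), (0.2, 0.8), (0.9, 0.9)]
costs = [coverage_cost(sensors, points, weights)]
for _ in range(10):
    sensors = lloyd_step(sensors, points, weights)
    costs.append(coverage_cost(sensors, points, weights))
```

With a fixed importance function, each step first reassigns grid points to their nearest sensor and then recenters each sensor, so the recorded `costs` do not increase from one iteration to the next. Once the importance function changes between steps, that is no longer automatic, which is where the conditions mentioned in the abstract come in.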
Distributed Multi-robot Source Seeking in Unknown Environments with Unknown Number of Sources
Chen, Lingpeng, Kailas, Siva, Deolasee, Srujan, Luo, Wenhao, Sycara, Katia, Kim, Woojun
We introduce a novel distributed source seeking framework, DIAS, designed for multi-robot systems in scenarios where the number of sources is unknown and potentially exceeds the number of robots. Traditional robotic source seeking methods typically focused on directing each robot to a specific strong source and may fall short in comprehensively identifying all potential sources. DIAS addresses this gap by introducing a hybrid controller that identifies the presence of sources and then alternates between exploration for data gathering and exploitation for guiding robots to identified sources. It further enhances search efficiency by dividing the environment into Voronoi cells and approximating source density functions based on Gaussian process regression. Additionally, DIAS can be integrated with existing source seeking algorithms. We compare DIAS with existing algorithms, including DoSS and GMES in simulated gas leakage scenarios where the number of sources outnumbers or is equal to the number of robots. The numerical results show that DIAS outperforms the baseline methods in both the efficiency of source identification by the robots and the accuracy of the estimated environmental density function.
- Asia > China (0.14)
- North America > United States > Illinois (0.14)
- Europe (0.14)
Identifying the Source of Generation for Large Language Models
Large language models (LLMs) memorize text from many source documents. During pretraining, an LLM is trained to maximize the likelihood of text, but it neither receives nor memorizes the source of that text. Accordingly, an LLM cannot provide document information for its generated content, and users get no hint of its reliability, which is crucial for assessing factuality or privacy infringement. This work introduces token-level source identification in the decoding step, which maps the token representation to the reference document. We propose a bi-gram source identifier, a multi-layer perceptron that takes two successive token representations as input for better generalization. We conduct extensive experiments on the Wikipedia and PG19 datasets with several LLMs, layer locations, and identifier sizes. Overall, the results show the feasibility of token-level source identifiers for tracing the document, a crucial problem for the safe use of LLMs.
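The bi-gram source identifier described above can be pictured as a small MLP over the concatenation of two successive token representations. The sketch below is illustrative only: the layer sizes, the ReLU activation, the single hidden layer, and the untrained random weights are assumptions, and the names `make_mlp` and `bigram_logits` are hypothetical. The abstract does not specify the architecture beyond "multi-layer perceptron".

```python
import random

def make_mlp(d_in, d_hidden, n_docs, seed=0):
    """Random, untrained weights for a one-hidden-layer MLP (illustrative sizes)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(d_in)] for _ in range(d_hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(d_hidden)] for _ in range(n_docs)]
    return w1, w2

def bigram_logits(h_prev, h_cur, mlp):
    """Score each candidate document from two successive token representations."""
    w1, w2 = mlp
    x = h_prev + h_cur  # concatenation: the bi-gram input
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in w1]  # ReLU
    return [sum(w * v for w, v in zip(row, hidden)) for row in w2]

# Toy hidden states of dimension 4 for tokens t-1 and t (made-up values).
h_prev = [0.1, -0.2, 0.3, 0.0]
h_cur = [0.5, 0.4, -0.1, 0.2]

n_docs = 5
mlp = make_mlp(d_in=len(h_prev) + len(h_cur), d_hidden=16, n_docs=n_docs)
logits = bigram_logits(h_prev, h_cur, mlp)
predicted_doc = max(range(n_docs), key=lambda d: logits[d])
```

In a real setting the weights would be trained with document labels, and the token representations would be taken from a chosen layer of the LLM, as the abstract indicates.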
Classification of Inkjet Printers based on Droplet Statistics
Takenaka, Patrick, Eberhardinger, Manuel, Grießhaber, Daniel, Maucher, Johannes
Knowing the printer model used to print a given document may provide a crucial lead towards identifying counterfeits or conversely verifying the validity of a real document. Inkjet printers produce probabilistic droplet patterns that appear to be distinct for each printer model and as such we investigate the utilization of droplet characteristics including frequency domain features extracted from printed document scans for the classification of the underlying printer model. We collect and publish a dataset of high resolution document scans and show that our extracted features are informative enough to enable a neural network to distinguish not only the printer manufacturer, but also individual printer models.
- North America > United States (0.46)
- Europe > Germany > Baden-Württemberg (0.16)
- Health & Medicine (0.93)
- Energy > Oil & Gas > Upstream (0.46)
Knowledge-Powered Recommendation for an Improved Diet Water Footprint
Joshi, Saurav, Ilievski, Filip, Pujara, Jay
According to WWF, 1.1 billion people lack access to water, and 2.7 billion experience water scarcity at least one month a year. By 2025, two-thirds of the world's population may be facing water shortages. This highlights the urgency of managing water usage efficiently, especially in water-intensive sectors like food. This paper proposes a recommendation engine, powered by knowledge graphs, aiming to facilitate sustainable and healthy food consumption. The engine recommends ingredient substitutes in user recipes that improve nutritional value and reduce environmental impact, particularly water footprint. The system architecture includes source identification, information extraction, schema alignment, knowledge graph construction, and user interface development. The research offers a promising tool for promoting healthier eating habits and contributing to water conservation efforts.
- North America > United States > New York > New York County > New York City (0.05)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.05)
- Europe > Portugal > Porto > Porto (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- Research Report (0.50)
- Overview (0.34)
- Health & Medicine > Consumer Health (1.00)
- Education > Health & Safety > School Nutrition (0.69)
Source Camera Identification and Detection in Digital Videos through Blind Forensics
Sameer, Venkata Udaya, Mukhopadhyay, Shilpa, Naskar, Ruchira, Dali, Ishaan
Source camera identification in digital videos is the problem of associating an unknown digital video with its source device, within a closed set of possible devices. Existing techniques for source detection in digital videos try to find a fingerprint of the actual source in the video, in the form of PRNU (Photo Response Non-Uniformity), and match it against the SPN (Sensor Pattern Noise) of each possible device. The highest correlation indicates the correct source. We investigate the problem of identifying a video source through a feature-based approach using machine learning. In this paper, we present a blind forensic technique for video source authentication and identification, based on feature extraction, feature selection and subsequent source classification. The main aim is to determine whether a claimed source for a video is actually its original source. If not, we identify its original source. Our experimental results demonstrate the efficiency of the proposed method compared to the traditional fingerprint-based technique.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
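The fingerprint-matching baseline that the abstract above contrasts itself with (correlate an extracted fingerprint against each device's sensor pattern noise and pick the highest correlation) can be sketched as follows. This illustrates only that traditional baseline, not the paper's feature-based classifier. The device names, the synthetic noise patterns, and the noise level are made-up assumptions, and extracting a real PRNU fingerprint from video frames is out of scope here.

```python
import math
import random

def normalized_correlation(a, b):
    """Pearson correlation between two equally long flattened noise patterns."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) *
                    sum((y - mean_b) ** 2 for y in b))
    return num / den if den > 0 else 0.0

def match_source(fingerprint, device_patterns):
    """Return the device whose reference pattern correlates best, plus all scores."""
    scores = {name: normalized_correlation(fingerprint, pattern)
              for name, pattern in device_patterns.items()}
    return max(scores, key=scores.get), scores

# Synthetic reference patterns for three hypothetical devices.
rng = random.Random(0)
device_patterns = {name: [rng.gauss(0.0, 1.0) for _ in range(256)]
                   for name in ("cam_A", "cam_B", "cam_C")}

# Simulated fingerprint: cam_B's pattern plus independent noise.
fingerprint = [v + rng.gauss(0.0, 0.5) for v in device_patterns["cam_B"]]

best, scores = match_source(fingerprint, device_patterns)
```

Here `best` is expected to be `"cam_B"`, since the other two patterns are statistically independent of the simulated fingerprint.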
Contaminant source identification in groundwater by means of artificial neural network
Secci, Daniele, Molino, Laura, Zanini, Andrea
An effective environmental protection system cannot exclude groundwater. Besides over-exploitation, which is at odds with the concept of sustainable development, groundwater contamination is a significant concern. Contamination stems mainly from intensive agricultural activities or industrialized areas. Several papers in the literature have dealt with the transport problem, especially inverse problems in which the release history or the source location is identified. The innovative aim of this paper is to develop a data-driven model that can analyze multiple scenarios, even strongly non-linear ones, in order to solve forward and inverse transport problems while preserving the reliability of the results and reducing the uncertainty. The tool also provides extremely fast responses, which are essential for identifying remediation strategies immediately. The advantages of the model were compared with literature studies. The data-driven model is a feedforward artificial neural network (ANN) trained to handle different cases: first, identifying the concentration of the pollutant at specific observation points in the study area (forward problem); second, solving inverse problems by identifying the release history at a known source location; third, in the case of a single contaminant source, identifying the release history and, at the same time, the location of the source in a specific sub-domain of the investigated area. Finally, the observation error is investigated and estimated. The results are satisfactory and highlight the capability of the ANN to deal with multiple scenarios by approximating nonlinear functions without a physical description of the phenomenon, providing reliable results with very low computational burden and uncertainty.
Source Identification for Mixtures of Product Distributions
Gordon, Spencer L., Mazaheri, Bijan, Rabani, Yuval, Schulman, Leonard J.
We give an algorithm for source identification of a mixture of $k$ product distributions on $n$ bits. This is a fundamental problem in machine learning with many applications. Our algorithm identifies the source parameters of an identifiable mixture, given, as input, approximate values of multilinear moments (derived, for instance, from a sufficiently large sample), using $2^{O(k^2)} n^{O(k)}$ arithmetic operations. Our result is the first explicit bound on the computational complexity of source identification of such mixtures. The running time improves previous results by Feldman, O'Donnell, and Servedio (FOCS 2005) and Chen and Moitra (STOC 2019) that guaranteed only learning the mixture (without parametric identification of the source). Our analysis gives a quantitative version of a qualitative characterization of identifiable sources that is due to Tahmasebi, Motahari, and Maddah-Ali (ISIT 2018).
- North America > United States > California (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Europe > Netherlands > South Holland > Dordrecht (0.04)
- Africa > Benin (0.04)
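The algorithm in the abstract above takes approximate multilinear moments as input. As a small illustration of that input only (not of the identification algorithm itself), the sketch below computes the moment E[prod_{i in S} X_i] of a mixture of k product distributions on n bits, both exactly from the parameters and empirically from samples. The mixture weights, the per-bit probabilities, and the function names are made-up assumptions.

```python
import random
from math import prod

def exact_moment(pi, p, S):
    """E[prod_{i in S} X_i] = sum_j pi[j] * prod_{i in S} p[j][i] for distinct bits in S."""
    return sum(w * prod(row[i] for i in S) for w, row in zip(pi, p))

def sample(pi, p, rng):
    """Draw one n-bit vector: pick a component, then independent Bernoulli bits."""
    j = rng.choices(range(len(pi)), weights=pi)[0]
    return [1 if rng.random() < q else 0 for q in p[j]]

def empirical_moment(samples, S):
    """Fraction of samples in which every bit indexed by S equals 1."""
    return sum(all(x[i] for i in S) for x in samples) / len(samples)

# Hypothetical mixture: k = 2 components on n = 3 bits.
pi = [0.5, 0.5]
p = [[0.2, 0.2, 0.2],
     [0.8, 0.8, 0.8]]

S = (0, 1)
m_exact = exact_moment(pi, p, S)  # 0.5 * 0.04 + 0.5 * 0.64 = 0.34

rng = random.Random(0)
samples = [sample(pi, p, rng) for _ in range(20000)]
m_emp = empirical_moment(samples, S)
```

Each bit has marginal probability 0.5, so a single product distribution with those marginals would give a pairwise moment of 0.25. The mixture's value of 0.34 reflects the correlation introduced by the hidden component, and this is the kind of information that moment-based identification draws on.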
Online Non-convex Learning for River Pollution Source Identification
Huang, Wenjie, Jiang, Jing, Liu, Xiao
In this paper, novel gradient-based online learning algorithms are developed to investigate an important environmental application: real-time river pollution source identification, which aims at estimating the released mass, the location and the release time of a river pollution source based on downstream sensor data monitoring the pollution concentration. The problem can be formulated as a non-convex loss minimization problem in statistical learning, and our online algorithms have vectorized and adaptive step-sizes to ensure high estimation accuracy on dimensions having different magnitudes. To keep the gradient-based methods from getting stuck at saddle points of the non-convex loss, an "escaping from saddle points" module and a multi-start version of the algorithms are derived to further improve the estimation accuracy by searching for the global minima of the loss functions. It can be shown theoretically and experimentally that the algorithms attain $O(N)$ local regret, and a high-probability cumulative regret bound of $O(N)$ under a particular error bound condition on the loss functions. A real-life river pollution source identification example shows that our algorithms outperform existing methods in estimation accuracy. Managerial insights for decision makers on using the algorithms in practice are also provided.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > California > Nevada County > Truckee (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)